Using randomization to create (nearly) identical groups

GVPT399F: Power, Politics, and Data

Our goal

Next best thing

Instead, we should try to make two groups that are as similar as possible to each other prior to treatment.

The magic of randomization

Perfectly random assignment does this very well!

Don’t take my word for it

Imagine we have a group of 1,000 individuals. We know the following about them:

  • Height

  • Weight

  • Eye colour

Our group

# A tibble: 1,000 × 4
      id height weight eye_colour
   <int>  <dbl>  <dbl> <chr>     
 1     1   175.   55.6 Blue      
 2     2   176.   86.6 Green     
 3     3   161.   56.1 Green     
 4     4   171.  106.  Blue      
 5     5   177.   86.1 Green     
 6     6   173.   88.8 Green     
 7     7   166.   82.0 Grey      
 8     8   179.   70.7 Brown     
 9     9   168.   91.8 Blue      
10    10   167.   76.6 Brown     
# ℹ 990 more rows
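The slides do not show how this group was built, so here is a minimal sketch of how a similar one could be simulated. The name `group_df`, the seed, the means and standard deviations, and the equal eye-colour shares are all illustrative assumptions, not the values behind the table above.

```r
library(tibble)

set.seed(399)  # arbitrary seed, only so the sketch is reproducible

n <- 1000

# Hypothetical population: every parameter below is an assumption
group_df <- tibble(
  id = 1:n,
  height = rnorm(n, mean = 170, sd = 6),   # cm
  weight = rnorm(n, mean = 80, sd = 15),   # kg
  eye_colour = sample(
    c("Blue", "Brown", "Green", "Grey"),
    size = n,
    replace = TRUE                         # each colour equally likely
  )
)

group_df
```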

Random assignment

I’m now going to flip an (imaginary, computer-generated) coin for each of these 1,000 individuals to assign them to group A or B:

# A tibble: 1,000 × 5
      id height weight eye_colour group
   <int>  <dbl>  <dbl> <chr>      <fct>
 1     1   175.   73.4 Blue       B    
 2     2   170.   80.2 Blue       A    
 3     3   176.   65.3 Blue       B    
 4     4   175.   65.4 Green      B    
 5     5   166.   55.2 Brown      A    
 6     6   162.   93.1 Green      B    
 7     7   169.   80.0 Green      A    
 8     8   171.   68.5 Brown      B    
 9     9   156.   73.6 Brown      A    
10    10   174.   64.6 Green      A    
# ℹ 990 more rows
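One way the coin flip could be written, as a sketch rather than the exact code behind the table above: `sample()` draws "A" or "B" with equal probability, once per row. The stand-in data frame `people` and its height parameters are hypothetical.

```r
library(dplyr)
library(tibble)

set.seed(399)  # arbitrary seed for reproducibility

# Stand-in for the 1,000 individuals (hypothetical values)
people <- tibble(
  id = 1:1000,
  height = rnorm(1000, mean = 170, sd = 6)
)

# One fair coin flip per person: each row independently lands in A or B
rand_group <- people |>
  mutate(group = factor(sample(c("A", "B"), size = n(), replace = TRUE)))

# How many people ended up in each group?
count(rand_group, group)
```

Because each person gets their own coin flip, the two groups will be close to 500 each but rarely exactly equal in size.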

How similar are these groups?

Let’s first check their heights:

library(tidyverse)  # provides ggplot2 and dplyr, used throughout

ggplot(rand_group, aes(x = height, fill = group)) + 
  geom_density(alpha = 0.5) + 
  theme_minimal() + 
  labs(x = "Height (cm)",
       y = "Density",
       fill = "Group")

How similar are these groups?

And their weights:

ggplot(rand_group, aes(x = weight, fill = group)) + 
  geom_density(alpha = 0.5) + 
  theme_minimal() + 
  labs(x = "Weight (kg)",
       y = "Density",
       fill = "Group")

How similar are these groups?

And their eye colors:

rand_group |> 
  count(group, eye_colour) |> 
  ggplot(aes(x = n, y = reorder(eye_colour, n), fill = group)) + 
  geom_col(position = "dodge") + 
  labs(x = "Count",
       y = "Eye color",
       fill = "Group")
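The plots can be backed up with numbers. Here is a minimal sketch of a balance table comparing the two groups on each characteristic. It rebuilds a stand-in `rand_group` with hypothetical values so that it runs on its own; the name `balance` is also just for illustration.

```r
library(dplyr)
library(tibble)

set.seed(399)  # arbitrary seed for reproducibility

# Stand-in data (hypothetical values), split by a coin flip per person
rand_group <- tibble(
  height = rnorm(1000, mean = 170, sd = 6),
  weight = rnorm(1000, mean = 80, sd = 15),
  eye_colour = sample(c("Blue", "Brown", "Green", "Grey"), 1000, replace = TRUE),
  group = factor(sample(c("A", "B"), 1000, replace = TRUE))
)

# One row per group: size, average height and weight, share with blue eyes
balance <- rand_group |>
  group_by(group) |>
  summarise(
    size = n(),
    mean_height = mean(height),
    mean_weight = mean(weight),
    share_blue = mean(eye_colour == "Blue")
  )

balance
```

If randomization worked, the two rows should look very similar in every column.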

Making sure this wasn’t a fluke

Let’s re-run this:

# A tibble: 1,000 × 5
      id height weight eye_colour group
   <int>  <dbl>  <dbl> <chr>      <fct>
 1     1   159.   72.0 Blue       A    
 2     2   172.  100.  Brown      B    
 3     3   170.   88.6 Brown      A    
 4     4   176.   88.7 Blue       B    
 5     5   168.   86.8 Green      A    
 6     6   183.   80.6 Brown      B    
 7     7   163.   79.5 Brown      A    
 8     8   174.   64.3 Green      B    
 9     9   171.   80.2 Green      A    
10    10   169.   84.3 Blue       A    
# ℹ 990 more rows

Check how similar they are

library(patchwork)

p1 <- ggplot(rand_group, aes(x = height, fill = group)) + 
  geom_density(alpha = 0.5) + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Height (cm)",
       y = "Density",
       fill = "Group")

p2 <- ggplot(rand_group, aes(x = weight, fill = group)) + 
  geom_density(alpha = 0.5) + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Weight (kg)",
       y = "Density",
       fill = "Group")

p3 <- rand_group |> 
  count(group, eye_colour) |> 
  ggplot(aes(x = n, y = reorder(eye_colour, n), fill = group)) + 
  geom_col(position = "dodge") + 
  labs(x = "Count",
       y = "Eye color",
       fill = "Group")

p1 | p2 | p3

Increasingly similar

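In fact, if we repeated this random assignment many, many, many times, the differences between the two groups would average out to zero. And the larger the groups, the more similar they tend to be in any single assignment!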

Why?

  • Central limit theorem

  • Law of large numbers
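We can watch both ideas at work in a small simulation. This is a sketch under assumed values: the helper `diff_in_means()` is hypothetical, and the height distribution, sample sizes, and 2,000 repetitions are arbitrary choices. Each call draws a fresh group of people, flips a coin for each one, and records the gap between the two groups' average heights.

```r
set.seed(399)  # arbitrary seed for reproducibility

# Hypothetical helper: simulate n people, randomly assign them,
# and return mean height in group A minus mean height in group B
diff_in_means <- function(n) {
  height <- rnorm(n, mean = 170, sd = 6)   # assumed heights in cm
  group <- sample(c("A", "B"), size = n, replace = TRUE)
  mean(height[group == "A"]) - mean(height[group == "B"])
}

# Repeat the whole exercise 2,000 times at two sample sizes
diffs_small <- replicate(2000, diff_in_means(100))
diffs_large <- replicate(2000, diff_in_means(1000))

# Across repetitions, the gaps centre on zero
mean(diffs_large)

# Larger groups give smaller gaps in any one assignment
sd(diffs_small)
sd(diffs_large)

# The gaps pile up in a bell shape around zero
hist(diffs_large, main = "Difference in mean height (A - B)", xlab = "cm")
```

The shrinking spread is the law of large numbers at work, and the bell shape of the histogram is what the central limit theorem leads us to expect.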